Teaching AI to translate 100s of spoken and written languages in real time
For people who understand languages like English, Mandarin, or Spanish, it may seem like today's apps and web tools already provide the translation technology we need. But billions of people are being left out, unable to easily access most of the information on the internet or connect with most of the online world in their native language. Today's machine translation (MT) systems are improving rapidly, but they still rely heavily on learning from large amounts of textual data, so they generally do not work well for low-resource languages, i.e., languages that lack training data, or for languages that don't have a standardized writing system. Eliminating language barriers would be profound, making it possible for billions of people to access information online in their native or preferred languages. Advances in MT won't just help people who don't speak one of the languages that dominate the internet today; they'll also fundamentally change the way people around the world connect and share ideas.
Introducing the First AI Model That Translates 100 Languages Without Relying on English
Next, we introduced a new bridge mining strategy, in which we grouped languages into 14 language groups based on linguistic classification, geography, and cultural similarities. People living in countries with languages of the same family tend to communicate more often and would benefit from high-quality translations. For instance, one group would include languages spoken in India, like Bengali, Hindi, Marathi, Nepali, Tamil, and Urdu. To connect the languages of different groups, we identified a small number of bridge languages, which are usually one to three major languages of each group. In the example above, Hindi, Bengali, and Tamil would be the bridge languages for the group of languages spoken in India.
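To make the idea concrete, here is a minimal Python sketch of how such a grouping could cut down the number of language pairs to mine. It rests on assumptions that are not spelled out in the text: that every pair inside a group is mined, and that bridge languages are paired with each other across groups. The second group ("romance"), the `GROUPS` table, and the `pairs_to_mine` function are hypothetical and only serve the illustration; the real system uses 14 groups.

```python
from itertools import combinations

# Toy grouping keyed by ISO 639-1 codes. The "india" group follows the example
# in the text; the "romance" group is a hypothetical second group.
GROUPS = {
    "india": {
        "languages": ["bn", "hi", "mr", "ne", "ta", "ur"],
        "bridges": ["hi", "bn", "ta"],
    },
    "romance": {
        "languages": ["es", "fr", "it", "pt"],
        "bridges": ["es", "fr"],
    },
}


def pairs_to_mine(groups):
    """Return the set of unordered language pairs selected for mining.

    Assumption: mine every pair within a group, plus every pair of
    bridge languages (which is what links different groups together).
    """
    pairs = set()
    # 1) All pairs of languages that share a group.
    for group in groups.values():
        pairs.update(combinations(sorted(group["languages"]), 2))
    # 2) All pairs of bridge languages, across every group.
    bridges = sorted({b for group in groups.values() for b in group["bridges"]})
    pairs.update(combinations(bridges, 2))
    return pairs


if __name__ == "__main__":
    selected = pairs_to_mine(GROUPS)
    total = sum(len(g["languages"]) for g in GROUPS.values())
    print(f"{len(selected)} pairs selected out of {total * (total - 1) // 2} possible")
```

In this toy setup, a pair such as Spanish and Hindi is mined directly because both are bridge languages, while Italian and Marathi are not paired directly and are connected only through the bridges of their groups.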